{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Project 5 - Goal 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's take a look at each file and make sure the first row contains the field names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n",
      "Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n",
      "\n",
      "-----------------\n",
      "ssn,first_name,last_name,gender,language\n",
      "100-53-9824,Sebastiano,Tester,Male,Icelandic\n",
      "\n",
      "-----------------\n"
     ]
    }
   ],
   "source": [
    "f_names = 'cars.csv', 'personal_info.csv'\n",
    "\n",
    "for f_name in f_names:\n",
    "    with open(f_name) as f:\n",
    "        print(next(f), end='')\n",
    "        print(next(f), end='')\n",
    "    print('\\n-----------------')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One thing I notice here is that the field names in the `cars.csv` file have uppercase letters - the second does not. I'm going to make those consistent by lowercasing the field names when I create the named tuple.\n",
    "\n",
    "The second thing I notice is that the delimiter is not the same for both files - one uses a `;`, the other uses a `,`. \n",
    "\n",
    "Fortunately, the `csv` module has a `Sniffer` class that we can use to try and deduce the delimiter from sampling some data. Alternatively, we could specify the delimiter to use as part of our conjtext manager's `__init__` method - but it would be nicer if we did not have to do that."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first see how that `Sniffer` class works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ";\n"
     ]
    }
   ],
   "source": [
    "import csv\n",
    "from itertools import islice\n",
    "\n",
    "with open('cars.csv') as f:\n",
    "    dialect = csv.Sniffer().sniff(f.read(1000))\n",
    "print(dialect.delimiter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we do this with our other file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",\n"
     ]
    }
   ],
   "source": [
    "with open('personal_info.csv') as f:\n",
    "    dialect = csv.Sniffer().sniff(f.read(1000))\n",
    "print(dialect.delimiter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we'll use this to set the dialect for our csv parser.\n",
    "\n",
    "Let's create a small utility function to handle this for us:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_dialect(f_name):\n",
    "    with open(f_name) as f:\n",
    "        return csv.Sniffer().sniff(f.read(1000))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to create a context manager that, given just the file name:\n",
    "1. reads the header row to get the field names\n",
    "2. creates the appropriate named tuple\n",
    "3. uses the `csv.reader` to create an iterator over the data rows of the file\n",
    "4. returns that iterator from the `__enter__` method\n",
    "5. closes the file upon `__exit__`\n",
    "\n",
    "We're actually going to create a class that will be **both** the context manager and the iterator - so we'll implement both of these protocols."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from collections import namedtuple\n",
    "\n",
    "class FileParser:\n",
    "    def __init__(self, f_name):\n",
    "        self.f_name = f_name\n",
    "        \n",
    "    def __enter__(self):\n",
    "        self._f = open(self.f_name, 'r')\n",
    "        self._reader = csv.reader(self._f, get_dialect(self.f_name))\n",
    "        headers = map(lambda x: x.lower(), next(self._reader))\n",
    "        self._nt = namedtuple('Data', headers)\n",
    "        return self\n",
    "        \n",
    "    def __exit__(self, exc_type, exc_value, exc_tb):\n",
    "        self._f.close()\n",
    "        return False\n",
    "    \n",
    "    def __iter__(self):\n",
    "        return self\n",
    "    \n",
    "    def __next__(self):\n",
    "        if self._f.closed:\n",
    "            # file has been closed - so we're can't iterate anymore!\n",
    "            raise StopIteration\n",
    "        else:\n",
    "            return self._nt(*next(self._reader))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')\n",
      "Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')\n",
      "Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')\n",
      "Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')\n",
      "Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')\n",
      "Data(ssn='105-27-5541', first_name='Federico', last_name='Aggett', gender='Male', language='Chinese')\n",
      "Data(ssn='105-85-7486', first_name='Angelina', last_name='McAvey', gender='Female', language='Punjabi')\n",
      "Data(ssn='105-91-5022', first_name='Moselle', last_name='Apfel', gender='Female', language='Latvian')\n",
      "Data(ssn='105-91-7777', first_name='Audi', last_name='Roach', gender='Female', language='Estonian')\n",
      "Data(ssn='106-35-1938', first_name='Mackenzie', last_name='Nussey', gender='Male', language='Swedish')\n"
     ]
    }
   ],
   "source": [
    "from itertools import islice\n",
    "\n",
    "with FileParser('cars.csv') as data:\n",
    "    for row in islice(data, 10):\n",
    "        print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And of course it should work equally well with the other file too:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')\n",
      "Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')\n",
      "Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')\n",
      "Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')\n",
      "Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')\n",
      "Data(ssn='105-27-5541', first_name='Federico', last_name='Aggett', gender='Male', language='Chinese')\n",
      "Data(ssn='105-85-7486', first_name='Angelina', last_name='McAvey', gender='Female', language='Punjabi')\n",
      "Data(ssn='105-91-5022', first_name='Moselle', last_name='Apfel', gender='Female', language='Latvian')\n",
      "Data(ssn='105-91-7777', first_name='Audi', last_name='Roach', gender='Female', language='Estonian')\n",
      "Data(ssn='106-35-1938', first_name='Mackenzie', last_name='Nussey', gender='Male', language='Swedish')\n"
     ]
    }
   ],
   "source": [
    "with FileParser('personal_info.csv') as data:\n",
    "    for row in islice(data, 10):\n",
    "        print(row)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
